Abstract Body


Background / Context: 

When using time series accountability data to evaluate system-wide education policies, 
concurrent changes often pose threats to internal validity. The standard difference-in-differences 
(DID) method resorts to a non-equivalent comparison group whose average outcome change is 
due to such confounding. This strategy relies on the strong assumption that the average 
confounding impact of concurrent events is the same for the comparison group unaffected by the 
policy and the experimental group affected by the policy. This assumption will be violated and 
therefore the DID results will be biased, for example, if the confounding effect varies by 
individual characteristics and if the experimental group and the comparison group differ in such 
characteristics (Meyer, 1995). Prior research has typically employed a DID model with linear 
covariance adjustment for observed pretreatment characteristics (e.g., Barnow, Cain, & 
Goldberger, 1980; Card & Krueger, 1993; Dynarski, 2003; Fitzpatrick, 2008). More recently, 
researchers have attempted to equate the covariate distribution of the comparison group with that 
of the experimental group through propensity score matching or weighting before conducting 
DID analyses (Abadie, 2005; Blundell et al., 2004; Cerda et al., 2012; Heckman, Ichimura, 
Smith, & Todd, 1998; Heckman, Ichimura, & Todd, 1997). Another approach is to nonlinearly 
estimate the distribution of the counterfactual outcome of the experimental group resembling the 
outcome change in the comparison group (Athey & Imbens, 2006). Each of these strategies 
invokes a set of strong assumptions that may not hold in a particular application. 

Purpose / Objective / Research Question / Focus of Study: 

We propose an alternative strategy that extends the Peters-Belson method (Belson, 1956; 
Peters, 1941) to the DID context. The use of prognostic scores (Hansen, 2008) in the causal 
inference literature can be viewed as the latest development of the Peters-Belson method. Our 
strategy involves a pair of prognostic scores per unit representing the predicted pre-policy 
outcome and the predicted post-policy outcome under the comparison condition in the absence of 
policy change. The difference between the two prognostic scores is the predicted amount of 
confounding attributable to concurrent events for each unit. Subsequent difference-in-differences 
analyses within subclasses defined by this pair of prognostic scores allow for a calibrated 
adjustment. Our rationale is to equate the predicted amount of confounding of concurrent events 
across the pre-policy experimental group, the post-policy experimental group, the pre-policy 
comparison group, and the post-policy comparison group within subclasses of units. We show 
that the average within-subclass DID estimate of the policy effect can be obtained through 
analyzing a weighted outcome model. 

This study provides the theoretical rationale for the prognostic score-based DID strategy, 
clarifies its identification assumptions, and develops an analytic procedure. The new strategy is 
then extended to multilevel multi-cohort education accountability data. We illustrate with an 
evaluation of a policy adopted by the Chicago Public Schools requiring all ninth graders to take 
algebra. We define the causal estimand and develop statistical models for investigating whether 
the policy effect was enhanced as its implementation became mature or whether the effect faded 
out over time as the reform lost its momentum after the initial period. 

Setting / Intervention / Program / Practice: 
The algebra-for-all policy adopted by the Chicago Public Schools (CPS) in 1997 was 
intended to eliminate remedial math courses for low-achieving students and thereby improve 
high school math achievement across the board. However, CPS students experienced a number 
of important policy changes during those same years. The impact of replacing remedial math 
with algebra was likely confounded by the impacts of those other concurrent interventions. 

Among the 59 neighborhood high schools in Chicago that existed both before and after 
1997, 45 schools offered remedial math to low-achieving students prior to 1997 and replaced 
remedial math with algebra after 1997; 14 schools offered algebra to all 9th graders prior to 1997 
and thus were unaffected by the policy. This provides a possibility of using the DID strategy to 
remove the confounding of concurrent events. Unlike most DID studies in which individuals in 
the experimental group started to experience the policy at a certain time while those in the 
comparison group never experienced the policy, in this application, the policy had already been 
implemented in the comparison schools before it was adopted by the experimental schools. 

Significance / Novelty of study: 

A major challenge to DID analyses is that the confounding effect of concurrent events 
may vary by pretreatment covariates that are distributed differently across the experimental 
group and the comparison group. This study contributes to the literature by developing a new 
alternative DID strategy that makes use of prognostic scores to adjust for the confounding effect 
of concurrent events. This new strategy invokes assumptions that are distinct from and often 
weaker than those of most existing DID methods. In particular, by using a prognostic score-based 
weighting adjustment, the outcome model is non-parametric in nature and hence is exempt from 
strong model-based assumptions that make conventional DID analyses prone to bias. 


Statistical, Measurement, or Econometric Model: 

For simplicity, we start by focusing on the mean difference in the math outcome between 
a pre-policy cohort and a post-policy cohort, contrasting a hypothetical experimental school with 
a hypothetical comparison school. We will then extend the results to multiple schools and multi- 
cohort data. 

Notation. Let $Y_i$ denote the math outcome of student $i$ at the end of the 9th grade measured 
on a continuous scale. Let $G_i = 1$ if the student attended an experimental school affected by the 
policy; let $G_i = 0$ if the student attended a comparison school unaffected by the policy. Let 
$T_i = 1$ if the student was enrolled in the 9th grade during the post-policy year and $T_i = 0$ if 
the student was enrolled in the pre-policy year. Let $X_i$ denote a vector of covariates measuring 
student characteristics that are not caused by the policy. 

Causal estimand. We are interested in estimating the average policy effect for students 
attending the experimental school in the pre-policy year (i.e., the treatment effect on the 
untreated in the experimental group), considering the possibility that the policy could have been 
introduced in an earlier year. Let $Y^{(1)}_{G1T0}$ denote the potential outcome that student $i$ 
would display if attending the experimental school and counterfactually having exposure to the 
policy in the pre-policy year; let $Y^{(0)}_{G1T0}$ denote the student’s potential outcome in the 
pre-policy year in the absence of the policy. Here the superscript indicates policy exposure while 
the subscript indicates school membership and cohort membership. The causal estimand is 

$$\delta_{G1T0} = E\left[Y^{(1)}_{G1T0} - Y^{(0)}_{G1T0} \mid G = 1, T = 0\right].$$

Standard DID method and bias. The standard DID estimator is 
$\{E[Y \mid G = 1, T = 1] - E[Y \mid G = 1, T = 0]\} - \{E[Y \mid G = 0, T = 1] - E[Y \mid G = 0, T = 0]\}$ 
and can be obtained through analyzing a linear model 
$Y_i = \alpha_0 + \alpha_1 T_i + \beta_0 G_i + \beta_1 G_i T_i + e_i$. The standard method will 
generate a biased estimate of the policy effect if the average confounding effect of the concurrent 
events differs between the experimental school and the comparison school. 
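
The equivalence between the four-mean contrast and the interaction coefficient of the linear model can be checked numerically. The sketch below is illustrative only: the function names and the simulated data are hypothetical and not part of the study, and NumPy is assumed to be available.

```python
import numpy as np

def did_from_means(y, g, t):
    """Standard DID: pre-to-post change for G = 1 minus the same change for G = 0."""
    y, g, t = map(np.asarray, (y, g, t))
    mean = lambda gv, tv: y[(g == gv) & (t == tv)].mean()
    return (mean(1, 1) - mean(1, 0)) - (mean(0, 1) - mean(0, 0))

def did_from_ols(y, g, t):
    """Coefficient on G*T in the model Y = a0 + a1*T + b0*G + b1*G*T + e."""
    y, g, t = map(np.asarray, (y, g, t))
    design = np.column_stack([np.ones(len(y)), t, g, g * t])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[3]

# Hypothetical simulated data (assumption): true interaction effect of 2.0
# and a common time trend of 0.5 shared by both groups.
rng = np.random.default_rng(0)
n = 2000
g = rng.integers(0, 2, n)
t = rng.integers(0, 2, n)
y = 1.0 + 0.5 * t + 0.3 * g + 2.0 * g * t + rng.normal(0, 1, n)

est_means = did_from_means(y, g, t)
est_ols = did_from_ols(y, g, t)
```

Because the linear model is saturated in G and T, the two estimates coincide up to floating-point error.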

Theoretical rationale for prognostic score-based DID. Across all four G-by-T groups of 
students, we attempt to identify a subpopulation of students defined by $X = x$ who would 
experience the same amount of confounding if attending the comparison school. We have 
observed $Y^{(1)}_{G0T1}$ of post-policy students in the comparison school and $Y^{(1)}_{G0T0}$ of 
pre-policy students in the comparison school. Prognostic score model specifications will allow us 
to predict this pair of potential outcomes for all students. We define $\psi_0(X)$ and $\psi_1(X)$ 
as the predicted pre-policy and post-policy outcomes respectively of a student if assigned to the 
comparison school, which are henceforth denoted by $\psi_0$ and $\psi_1$ respectively. Within each 
homogeneous subpopulation defined by $\psi_0$ and $\psi_1$, DID analysis is expected to generate an 
unbiased estimate of the policy effect of interest. 

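Identification assumption. Suppose that $\psi_0$ and $\psi_1$ are based on true models for 
$Y^{(1)}_{G0T0}$ and $Y^{(1)}_{G0T1}$ respectively. Then we have that 
$Y^{(1)}_{G0T0} \perp X \mid \psi_0, \psi_1$ and $Y^{(1)}_{G0T1} \perp X \mid \psi_0, \psi_1$. Our key 
identification assumption is that 

$$E\left[Y^{(1)}_{G1T1} \mid G = 1, T = 1, \psi_0, \psi_1\right] - E\left[Y^{(1)}_{G1T0} \mid G = 1, T = 0, \psi_0, \psi_1\right]$$
$$= E\left[Y^{(1)}_{G0T1} \mid G = 0, T = 1, \psi_0, \psi_1\right] - E\left[Y^{(1)}_{G0T0} \mid G = 0, T = 0, \psi_0, \psi_1\right].$$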

The above assumption implies that: (a) $\psi_t = f(x, t)$, for $t = 0, 1$, defines the function for 
the counterfactual outcome under the comparison condition at time $t$ regardless of one’s actual 
treatment group membership; (b) the support for the observed covariates $X$ in the comparison 
school, denoted by $\mathcal{X}_0$, encompasses the support in the experimental school, denoted by 
$\mathcal{X}_1$. A proof in Appendix B1 shows that, conditioning on $\psi_0$ and $\psi_1$, the policy 
effect $\delta_{G1T0}$ can be estimated without bias under these assumptions. 

Analytic procedure. We specify a prognostic score model for the comparison school 
students in the pre-policy year and a second prognostic score model for the comparison school 
students in the post-policy year. We then apply these two models to all students in all four 
G-by-T combinations and predict a pair of prognostic scores $\psi_0$ and $\psi_1$ for every student. 
To conduct DID within cells jointly defined by $\psi_0$ and $\psi_1$, we may divide the sample into 
three strata on the basis of $\psi_0$ and then subdivide each stratum into three on the basis of 
$\psi_1$. We then conduct a standard DID analysis within each of the nine cells and pool the results 
to obtain an estimate of the policy effect. This procedure allows the DID estimate to differ across 
different levels of $\psi_0$ and $\psi_1$. Let $D_s$ for $s = 1, \ldots, 9$ denote the nine cells. 
Through analyzing the model 
$Y_i = \alpha_1 T_i + \beta_0 G_i + \beta_1 G_i T_i + \sum_{s=1}^{9} \lambda_s D_{si} + e_i$, 
we obtain $\beta_1$ as an estimate of the average policy effect. This is equivalent to analyzing a 
weighted model $Y_i = \alpha_0 + \alpha_1 T_i + \beta_0 G_i + \beta_1 G_i T_i + e_i$. Appendix B2 
shows the weights to be computed for estimating $\delta_{G1T0}$. The weighted model is relatively 
convenient to use in multilevel multi-cohort analysis. 
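
The analytic procedure above can be sketched in code. This is a minimal illustration under stated assumptions, not the study's implementation: the prognostic score models are taken to be ordinary linear regressions, the strata are terciles, cells are pooled by their share of pre-policy experimental students, and all function names and the simulated data are hypothetical.

```python
import numpy as np

def fit_ols(x, y):
    """Fit y on [1, x] by least squares (a stand-in prognostic score model)."""
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_ols(coef, x):
    return np.column_stack([np.ones(len(x)), x]) @ coef

def tercile(v):
    """Assign each value to stratum 0, 1, or 2 by the terciles of v."""
    return np.searchsorted(np.quantile(v, [1 / 3, 2 / 3]), v, side="right")

def did(y, g, t, mask):
    """Standard DID among the units selected by mask."""
    m = lambda gv, tv: y[mask & (g == gv) & (t == tv)].mean()
    return (m(1, 1) - m(1, 0)) - (m(0, 1) - m(0, 0))

def prognostic_did(y, g, t, x):
    # Step 1: one prognostic model per period, fitted on comparison students only.
    pre, post = (g == 0) & (t == 0), (g == 0) & (t == 1)
    psi0 = predict_ols(fit_ols(x[pre], y[pre]), x)    # predicted pre-policy outcome
    psi1 = predict_ols(fit_ols(x[post], y[post]), x)  # predicted post-policy outcome
    # Step 2: three strata on psi0, each subdivided into three on psi1.
    s0 = tercile(psi0)
    cell = np.empty(len(y), dtype=int)
    for k in range(3):
        cell[s0 == k] = 3 * k + tercile(psi1[s0 == k])
    # Step 3: DID within each cell, pooled by the number of pre-policy
    # experimental students in the cell.
    target = (g == 1) & (t == 0)
    total, weight_sum = 0.0, 0.0
    for s in range(9):
        in_cell = cell == s
        counts = [(in_cell & (g == gv) & (t == tv)).sum()
                  for gv in (0, 1) for tv in (0, 1)]
        if min(counts) == 0:
            continue  # a cell lacking any of the four groups cannot supply a DID
        w = (in_cell & target).sum()
        total += w * did(y, g, t, in_cell)
        weight_sum += w
    return total / weight_sum

# Hypothetical simulation (assumption): the time trend depends on the first
# covariate, whose mean differs between the two groups.
rng = np.random.default_rng(0)
n = 20000
g = rng.integers(0, 2, n)
t = rng.integers(0, 2, n)
x = rng.normal(size=(n, 2))
x[:, 0] += 0.8 * g
true_effect = 2.0
y = (1.0 + x[:, 0] + 0.5 * x[:, 1] + t * (0.5 + x[:, 0])
     + true_effect * g * t + rng.normal(size=n))

naive = did(y, g, t, np.ones(n, dtype=bool))
adjusted = prognostic_did(y, g, t, x)
```

In this simulated setting the unadjusted DID absorbs the group difference in the covariate-dependent trend, while the stratified estimate should sit closer to the true effect. Some residual bias from coarse stratification is expected with only nine cells.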

Extension to multilevel multi-cohort data. In multilevel data, a student’s potential 
outcome is a function of student-level pretreatment covariates X and school-level pretreatment 
covariates W. One may specify a pair of two-level prognostic score models with students at level 
1 and schools at level 2. In theory, a student might have multiple prognostic scores depending on 
which comparison school the student might have counterfactually attended. We define the 
prognostic scores as a student’s predicted outcome of attending a typical comparison school, 
which can be viewed as the average of the school-specific prognostic scores for the student. In 
accountability systems in which repeated assessments of student academic achievement have 
been equated vertically, one may model the growth trajectories of students as well. Suppose that 
the data include three pre-policy cohorts denoted by $t = -2, -1, 0$ and three post-policy cohorts 
denoted by $t' = 1, 2, 3$. The average first-year policy effect on the math learning of pre-policy 
students attending experimental schools is defined as 

$$\sum_{t=-2}^{0} E\left[Y^{(1)}_{G1Tt} - Y^{(0)}_{G1Tt} \mid G = 1, T = t\right] pr(t \mid G = 1).$$

Here $pr(t \mid G = 1)$ is the proportion of pre-policy students in experimental schools entering the ninth 
grade in year t. To investigate whether the policy effect may depend on the maturity of 
implementation, we may combine the results of multiple pair-wise DID analyses. Each DID 
analysis contrasts one pre-policy cohort in year t with one post-policy cohort in year t' and is 
based on the corresponding pair of prognostic scores $\psi_t$ and $\psi_{t'}$. Let $I(t', t)$ for 
$t' = 1, 2, 3$ be the indicator for the subset of data used in the DID analysis for estimating the 
policy effect after $t'$ years of implementation. Let $Z = 1$ denote the post-policy years and 
$Z = 0$ the pre-policy years. A weighted outcome model for student $i$ attending school $j$ will be 

$$Y_{ij} = \sum_{t'=1}^{3} I_{ij}(t', t)\left(\alpha_{0t'} + \alpha_{1t'} Z_{ij} + \beta_{0t'} G_{ij} + \beta_{1t'} G_{ij} Z_{ij}\right) + u_j + e_{ij}.$$

Here $\beta_{11}$, $\beta_{12}$, and $\beta_{13}$ estimate the policy effects after one year, two years, and three years of 
implementation, respectively. 

Usefulness / Applicability of Method: 

The theoretical results and the analytic procedure presented above apply to a continuous 
outcome such as student achievement data as well as a binary outcome such as whether a student 
eventually graduates from high school. Various semi-parametric and non-parametric strategies 
can be employed in specifying the prognostic score models. Issues related to model 
misspecifications are beyond the scope of the current paper. However, by allowing the models 
for $\psi_0$ and $\psi_1$ to be different functions of $X$ under the comparison condition, T and G each take a 
fixed value in a prognostic score model. Hence there is no need to consider T-by-X interaction, 
G-by-X interaction, T-by-G interaction, and T-by-G-by-X three-way interactions in any given 
model. Finally, the prognostic score-based DID method does not preclude covariance adjustment 
in the outcome model for further bias removal and precision improvement. We demonstrate the 
usefulness of this new method through simulations and an application study. 

Conclusions: 

Past DID applications have relied heavily on model-based assumptions with regard to the 
temporal trend in the data in the absence of policy change. Applying the prognostic score-based 
DID strategy to multi-cohort data, we define the causal estimands non-parametrically and 
therefore do not impose a linear time trend. This new strategy greatly reduces the dimensionality 
of covariates for adjustment, which is a major advantage over the linear covariance adjusted 
DID. The stratification procedure enables researchers to detect heterogeneity in the confounding 
effects of concurrent events as well as in the policy effect. Yet like most other DID methods, this 
new strategy shows limitations when the experimental group and the comparison group differ in 
the distribution of an unobservable and when the amount of confounding of concurrent events is 
a function of the unobservable. Sensitivity analysis may be developed to assess the amount of 
bias associated with a possible unobservable covariate. 

Appendix B1. 


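Here we prove that the DID estimator integrated over the joint distribution of the two 
prognostic scores $\psi_0$ and $\psi_1$ is an unbiased estimate of $\delta_{G1T0}$ under the 
identification assumptions (8), (9a), and (9b). 

$$
\begin{aligned}
&\iint \left(D_{G1 \mid \psi_1 \psi_0} - D_{G0 \mid \psi_1 \psi_0}\right) f(\psi_0 \mid \psi_1) f(\psi_1)\, d\psi_0\, d\psi_1 \\
&= \iint \Big(\big\{E[Y \mid G = 1, T = 1, \psi_1, \psi_0] - E[Y \mid G = 1, T = 0, \psi_1, \psi_0]\big\} \\
&\qquad - \big\{E[Y \mid G = 0, T = 1, \psi_1, \psi_0] - E[Y \mid G = 0, T = 0, \psi_1, \psi_0]\big\}\Big) f(\psi_0 \mid \psi_1) f(\psi_1)\, d\psi_0\, d\psi_1 \\
&= \iint \Big(\big\{E[Y^{(1)}_{G1T1} \mid G = 1, T = 1, \psi_1, \psi_0] - E[Y^{(0)}_{G1T0} \mid G = 1, T = 0, \psi_1, \psi_0]\big\} \\
&\qquad - \big\{E[Y^{(1)}_{G0T1} \mid G = 0, T = 1, \psi_1, \psi_0] - E[Y^{(1)}_{G0T0} \mid G = 0, T = 0, \psi_1, \psi_0]\big\}\Big) f(\psi_0 \mid \psi_1) f(\psi_1)\, d\psi_0\, d\psi_1 \\
&= \iint \big\{E[Y^{(1)}_{G1T0} \mid G = 1, T = 0, \psi_1, \psi_0] - E[Y^{(0)}_{G1T0} \mid G = 1, T = 0, \psi_1, \psi_0]\big\} f(\psi_0 \mid \psi_1) f(\psi_1)\, d\psi_0\, d\psi_1 \\
&\quad + \iint \Big(\big\{E[Y^{(1)}_{G1T1} \mid G = 1, T = 1, \psi_1, \psi_0] - E[Y^{(1)}_{G1T0} \mid G = 1, T = 0, \psi_1, \psi_0]\big\} \\
&\qquad - \big\{E[Y^{(1)}_{G0T1} \mid G = 0, T = 1, \psi_1, \psi_0] - E[Y^{(1)}_{G0T0} \mid G = 0, T = 0, \psi_1, \psi_0]\big\}\Big) f(\psi_0 \mid \psi_1) f(\psi_1)\, d\psi_0\, d\psi_1 \\
&= \delta_{G1T0}.
\end{aligned}
$$

The second integral in the next-to-last step is zero under the key identification assumption. 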

Appendix B2. 


To obtain an estimate of the policy effect on the untreated pre-policy students in the 
experimental school, $\delta_{G1T0}$, the DID estimate in each of the nine cells is to be weighted 
by the cell-specific proportion of pre-policy students in the experimental school. The weight 
$\omega$ is computed as follows. 

When $G = 1$, $T = 1$, $S = s$: 

$$\omega = \frac{pr(G = 1 \mid T = 1)}{pr(G = 1 \mid T = 1, S = s)} \times \frac{pr(G = 1 \mid T = 0, S = s)}{pr(G = 1 \mid T = 0)};$$

when $G = 0$, $T = 1$, $S = s$: 

$$\omega = \frac{pr(G = 0 \mid T = 1)}{pr(G = 0 \mid T = 1, S = s)} \times \frac{pr(G = 1 \mid T = 0, S = s)}{pr(G = 1 \mid T = 0)};$$

when $G = 1$, $T = 0$, $S = s$: 

$$\omega = 1;$$

when $G = 0$, $T = 0$, $S = s$: 

$$\omega = \frac{pr(G = 0 \mid T = 0)}{pr(G = 0 \mid T = 0, S = s)} \times \frac{pr(G = 1 \mid T = 0, S = s)}{pr(G = 1 \mid T = 0)}.$$
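
The four weight formulas above can be computed directly from sample proportions. The sketch below is illustrative: the function name `cell_weights` and the simulated data are hypothetical, and it assumes every cell contains students from all four G-by-T groups so that no proportion in a denominator is zero.

```python
import numpy as np

def cell_weights(g, t, s):
    """Weights by group (g), period (t), and cell (s), using sample proportions.

    Pre-policy experimental students (G = 1, T = 0) keep a weight of 1.
    """
    g, t, s = map(np.asarray, (g, t, s))
    w = np.ones(len(g), dtype=float)
    pre = t == 0
    p_exp_pre = g[pre].mean()                               # pr(G = 1 | T = 0)
    for cell in np.unique(s):
        in_cell = s == cell
        ratio = g[pre & in_cell].mean() / p_exp_pre         # pr(G=1|T=0,S=s) / pr(G=1|T=0)
        for gv, tv in [(1, 1), (0, 1), (0, 0)]:
            period = t == tv
            p_marginal = (g[period] == gv).mean()           # pr(G = g | T = t)
            p_in_cell = (g[period & in_cell] == gv).mean()  # pr(G = g | T = t, S = s)
            w[in_cell & period & (g == gv)] = p_marginal / p_in_cell * ratio
    return w

# Hypothetical data (assumption): nine cells, with the share of experimental
# students rising across cells.
rng = np.random.default_rng(1)
n = 5000
s = rng.integers(0, 9, n)
g = (rng.random(n) < 0.3 + 0.05 * s).astype(int)
t = rng.integers(0, 2, n)
w = cell_weights(g, t, s)

target = (g == 1) & (t == 0)
comparison_pre = (g == 0) & (t == 0)
target_share = np.bincount(s[target], minlength=9) / target.sum()
weighted_share = (np.bincount(s[comparison_pre], weights=w[comparison_pre], minlength=9)
                  / w[comparison_pre].sum())
```

With these formulas, the reweighted pre-policy comparison students have the same distribution across cells as the pre-policy experimental students, which is what `target_share` and `weighted_share` compare.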